Search results for "web page"
Showing 10 of 66 documents
Robust Neural Machine Translation: Modeling Orthographic and Interpunctual Variation
2020
Neural machine translation systems are typically trained on curated corpora and break when faced with non-standard orthography or punctuation. Resilience to spelling mistakes and typos, however, is crucial, as machine translation systems are used to translate texts of informal origin, such as chat conversations, social media posts, and web pages. We propose a simple generative noise model that produces adversarial examples of ten different types. We use these to augment machine translation systems’ training data and show that, when tested on noisy data, systems trained using adversarial examples perform almost as well as when translating clean data, while baseline systems’ performance drops by…
Methods for defining user groups and user-adjusted information structures
1999
A common problem in the design of information systems is how to structure information in a way that is most useful to different groups of users. This paper describes some statistical methods for revealing the structure inherent in empirical data elicited from users. It is illustrated by the application of these methods to the design of some web pages giving information about the Universitat de Valencia. Three potential user groups were identified: administrative staff, teaching staff, and students. The first analysis demonstrated that users within these three groups assign relatively homogeneous structures, but that the structures assigned by the three groups are not the same, and also, …
Security Implications of Using Third-Party Resources in the World Wide Web
2018
Modern web pages have nothing in common with the static connotation of the word “page”: each is a dynamic, unique experience created by active content executed within the browser, assembled just in time from resources hosted on many different domains. Active content increases the attack surface, naturally exposing users to many novel threats. Popular security advice has been to deploy active-content blocker plugins such as NoScript; unfortunately, these are not capable of effectively stopping the attacks. Content Security Policy (CSP) can be effective against these attacks, but we demonstrate how poor decisions made by website administrators or external resource hosts can render CSP in…
Usability analysis and visualization of Web 2.0 applications
2008
Nowadays, companies and home users rely on Web offerings ranging from simple Web sites to complex Web applications. Often the ergonomics of these applications goes unconsidered, and they turn out to be hard to use. To examine usability from within a Web application, information about the application's usage is collected. The techniques used in the past for Web 1.0 are no longer adequate: Ajax programs (Web 2.0) are more flexible and require other techniques. This paper presents techniques for the collection, analysis, processing, and visualization of usage data for Web 2.0 applications.
Applying the ReMiP to Web Site Migration
2007
Web sites serve to publish information, both locally in intranets and on a global scale. Like all software systems, they have to cope with changing requirements and evolving technologies. ReMiP, a reference process model for software migration, provides a generic process for migration projects in general. The paper introduces ReMiP and summarises the application of a tailored ReMiP to migrating a static HTML-based Web site to a content management system.
What you see is what you get? Measuring companies' projected employer image attributes via companies' employment webpages
2021
Information on a company's employment webpage sends signals about the employer image the company intends to project to applicants. Nonetheless, we know little about the content of recruitment signals sent via company employment webpages. This study develops a method to measure companies’ projected employer image attributes based on their employment webpages. Specifically, we analyze companies’ projected employer image attributes by applying computer‐aided text analysis (CATA) to the employment webpages of 461 Fortune 500 companies (i.e., more than 11,100 individual pages). Our results show that projected employer image attributes remain relatively stable over time. Moreover, we find relativ…
Processing and learning from multiple sources: A comparative case study of students with dyslexia working in a multiple source multimedia context
2019
This study investigated how four 10th-grade students with dyslexia processed and integrated information across web pages and representations when learning in a multiple source multimedia context. Eye movement data showed that participants’ processing of the materials varied with respect to their initial exploration of the web pages, their overall processing time, and the linearity of their processing patterns, with post-learning interviews indicating the deliberate, strategic considerations underlying each participant’s processing pattern. Eye movement data in terms of fixation duration and percentage of regressions also corroborated the findings of formal, diagnostic assessments. Finally, …
DBSCAN Algorithm for Document Clustering
2019
Document clustering is the problem of automatically grouping similar documents into categories based on some similarity metric. Almost all available data, especially on the web, are unclassified, so we need powerful clustering algorithms that work with these types of data. All common search engines return a list of pages relevant to the user query. This list needs to be generated quickly and as correctly as possible. For this type of problem, because web pages are unclassified, we need powerful clustering algorithms. In this paper we present a clustering algorithm called DBSCAN – Density-Based Spatial Clustering of Applications with Noise – and its limitations on documents (or web pages)…
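The abstract above names DBSCAN but does not describe it; for orientation, the core procedure is compact enough to sketch. Below is a minimal pure-Python version over 2-D points with Euclidean distance — the function name, parameters (`eps`, `min_pts`), and sample data are illustrative and not taken from the paper, which applies the algorithm to document vectors with a text-similarity metric.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: label each point with a cluster id, or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices within eps of point i (a point counts as its own neighbor)
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:          # not a core point
            labels[i] = NOISE
            continue
        labels[i] = cluster              # i starts a new cluster
        seeds = list(nbrs)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:       # border point reclaimed from noise
                labels[j] = cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:   # j is also a core point: expand
                seeds.extend(j_nbrs)
        cluster += 1
    return labels

# Two dense groups and one outlier
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=2.0, min_pts=2))   # → [0, 0, 0, 1, 1, 1, -1]
```

The sketch is O(n²) because it scans all points for each neighborhood query; production implementations (e.g. scikit-learn's `DBSCAN`) use spatial indexes, which is one reason the paper's scalability concerns on large web-page collections matter.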
A web search methodology for different user typologies
2009
Search engines and directories are the main tools used to find desired information in the ocean of digital content that is the Web. However, they are not presently able to understand a user's specific needs and prior knowledge, owing to their inability to simulate the processes of the human mind. Natural Language Processing, Folksonomy, the Semantic Web, and Serendipitous Surfing are some of the recent research fields working towards an understanding of human natural language and, in general, of real user needs. This work aims to add one more step to this evolution path by presenting a new search methodology that allows users to create new knowledge paths on the web based on their specific requirements. Thus…
Using Internet videos to learn about controversies: Evaluation and integration of multiple and multimodal documents by primary school students
2020
In many Internet videos, authors appear in front of the camera to present their particular view on a topic. Given the high consumption rate of Internet videos by teenagers, we explored the pros and cons of using these videos to learn about complex topics, compared to learning from textual web pages. Specifically, we studied how 207 primary school students (grades 4–6) evaluated and integrated multiple and multimodal web pages (text or video) while learning about the pros and cons of bottled water. Results showed no major role of modality in students' source memory, as measured by citations in their responses to an integration question and their memory for sources. Nevertheless, moda…